skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Zhang, Xiangliang"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available July 1, 2026
  2. Fair influence maximization in networks has been actively studied to ensure equity in fields like viral marketing and public health. Existing studies often assume an offline setting, meaning that the learner identifies a set of seed nodes with known per-edge activation probabilities. In this paper, we study the problem of fair online influence maximization, i.e., without knowing the ground-truth activation probabilities. The learner in this problem aims to maximally propagate the information among demographic groups, while interactively selecting seed nodes and observing the activation feedback on the fly. We propose Fair Online Influence Maximization (FOIM) framework that can solve the online influence maximization problem under a wide range of fairness notions. Given a fairness notion, FOIM solves the problem with a combinatorial multi-armed bandit algorithm for balancing exploration-exploitation and an offline fair influence maximization oracle for seed nodes selection. FOIM enjoys sublinear regret when the fairness notion satisfies two mild conditions, i.e., monotonicity and bounded smoothness. Our analyses show that common fairness notions, including maximin fairness, diversity fairness, and welfare function, all satisfy the condition, and we prove the corresponding regret upper bounds under these notions. Extensive empirical evaluations on three real-world networks demonstrate the efficacy of our proposed framework. 
    more » « less
    Free, publicly-accessible full text available June 28, 2026
  3. Free, publicly-accessible full text available July 1, 2026
  4. Pretraining of NERF models on chemically related mechanisms significantly improves the performance compared to pretraining by larger, mechanistically dissimilar reaction datasets. 
    more » « less
    Free, publicly-accessible full text available May 14, 2026
  5. Free, publicly-accessible full text available March 10, 2026
  6. Chemical reaction data has existed and still largely exists in unstructured forms. But curating such information into datasets suitable for tasks such as yield and reaction outcome prediction is impractical via manual curation and not possible to automate through programmatic means alone. Large language models (LLMs) have emerged as potent tools, showcasing remarkable capabilities in processing textual information and therefore could be extremely useful in automating this process. To address the challenge of unstructured data, we manually curated a dataset of structured chemical reaction data to fine-tune and evaluate LLMs. We propose a paradigm that leverages prompt-tuning, fine-tuning techniques, and a verifier to check the extracted information. We evaluate the capabilities of various LLMs, including LLAMA-2 and GPT models with different parameter counts, on the data extraction task. Our results show that prompt tuning of GPT-4 yields the best accuracy and evaluation results. Fine-tuning LLAMA-2 models with hundreds of samples does enable them and organize scientific material according to user-defined schemas better though. This workflow shows an adaptable approach for chemical reaction data extraction but also highlights the challenges associated with nuance in chemical information. We open-sourced our code at GitHub. 
    more » « less
  7. Inverse molecular generation is an essential task for drug discovery, and generative models offer a very promising avenue, especially when diffusion models are used. Despite their great success, existing methods are inherently limited by the lack of a semantic latent space that can not be navigated and perform targeted exploration to generate molecules with desired properties. Here, we present a property-guided diffusion model for generating desired molecules, which incorporates a sophisticated diffusion process capturing intricate interactions of nodes and edges within molecular graphs and leverages a time-dependent molecular property classifier to integrate desired properties into the diffusion sampling process. Furthermore, we extend our model to a multi-property-guided paradigm. Experimental results underscore the competitiveness of our approach in molecular generation, highlighting its superiority in generating desired molecules without the need for additional optimization steps. 
    more » « less
  8. Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors preserving the molecular structures and features, on top of which the downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, especially the methods incorporating chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. Then we summarize and categorize MRL methods into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions. 
    more » « less